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The completion of low rank matrices from few entries is a task with many practical applications. 
We consider here two aspects of this problem: detectability, i.e. the ability to estimate the rank r 
reliably from the fewest possible random entries, and performance in achieving small reconstruction 
error. We propose a spectral algorithm for these two tasks called MaCBetH (for Matrix Completion 
with the Bethe Hessian). The rank is estimated as the number of negative eigenvalues of the 
Bethe Hessian matrix, and the corresponding eigenvectors are used as initial condition for the 
minimization of the discrepancy between the estimated matrix and the revealed entries. We analyze 
the performance in a random matrix setting using results from the statistical mechanics of the 
Hopfield neural network, and show in particular that MaCBetH efficiently detects the rank r of a 
large n x m matrix from C(r)r^/nm entries, where C(r) is a constant depicted in Fig. [ 2 ] We also 
evaluate the corresponding root-mean-square error empirically and show that MaCBetH compares 
favorably to other existing approaches. 


I. INTRODUCTION 

Matrix completion is the task of inferring the missing entries of a matrix given a subset of known entries. Typically, 
this is possible because the matrix to be completed has (at least approximately) low rank r. This problem has 
witnessed a burst of activity, see e.g. m, motivated by many applications such as collaborative filtering [T], quantum 
tomography [3] in physics, or the analysis of a covariance matrix [T] .A commonly studied model for matrix completion 
assumes the matrix to be exactly low rank, with the known entries chosen uniformly at random and observed without 
noise. The most widely considered question in this setting is how many entries need to be revealed such that the 
matrix can be completed exactly in a computationally efficient way mm- While our present paper assumes the same 
model, the main questions we investigate are different. 

The first question we address in this paper is detectability , i.e. how many random entries do we need to reveal in 
order to be able to estimate the rank r reliably. This question is motivated by the more generic problem of detecting 
structure (in our case, low rank) hidden in partially observed data. It is reasonable to expect the existence of a region 
where exact completion is hard or even impossible yet the rank estimation is tractable. A second question we address 
is what is the minimum achievable root-mean-square error (RMSE) in estimating the unknown elements of the matrix. 
In practice, even if exact reconstruction is not possible, having a procedure that provides a very small RMSE might 
be quite sufficient. 

In this paper we propose an algorithm called MaCBetH that gives the best known empirical performance for the two 
tasks above when the rank r is small. The rank in our algorithm is estimated as the number of negative eigenvalues 
of an associated Bethe Hessian matrix EDO, and the corresponding eigenvectors are used as an initial condition for 
the local optimization of a cost function commonly considered in matrix completion (see e.g. pi]). In particular, 
in the random matrix setting, we show that MaCBetH detects the rank of a large n x m matrix from C{r)r^/nm 
entries, where C(r) is a small constant, see Fig. [ 2 ] and C(r) —> 1 as r —> 00 . The corresponding RMSE is evaluated 
empirically, and in the regime close to it compares very favorably to existing approaches, in particular to 

OptSpace [3]. 

This contribution is organized as follows. First, in Sec. ^we define the problem and present generally our approach 
in the context of existing works. In Sec. |III| we describe our algorithm and motivate its construction via a spectral 
relaxation of the Hopfield model of neural network. Next, in Sec. |IV| we show how the performance of the proposed 
spectral method can be analyzed using, in parts, results from spin glass theory and phase transitions, and rigorous 
results on the spectral density of large random matrices. Finally, in Sec. |^we present numerical simulations that 
demonstrate the efficiency of MaCBetH. 
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II. PROBLEM DEFINITION AND RELATION TO OTHER WORKS 

Let ,M true be a rank-r matrix such that 


M trne = X y f } (!) 

where X £ K nxr and Y £ R mxr are two (unknown) tall matrices. We observe only a small fraction of the elements of 
A! true , chosen uniformly at random. We call E the subset of observed entries, and A4 the (sparse) matrix supported on 
E whose nonzero elements are the revealed entries of A! tme . The aim is to reconstruct the rank r matrix Af true = XY' 
given A4. An important parameter which controls the difficulty of the problem is e = \E\/y/nm. In the case of a 
square matrix Xi, this is the average number of revealed entries per line or column. 

In our numerical examples and theoretical justifications we shall generate the low rank matrix AI true = XY' , using 
tall matrices X and Y with iid Gaussian elements, we call this the random matrix setting. The MaCBetH algorithm 
is, however, non-parametric and does not use any prior knowledge about X and Y. The analysis we perform applies 
to the limit n — > oo while m/n = a = 0(1) and r = 0(1). 

The matrix completion problem was popularized in [T] who proposed nuclear norm minimization as a convex 
relaxation of the problem. The algorithmic complexity of the associated semidefinite programming is, however, 
0(n 2 m 2 ). A low complexity procedure to solve the problem was later proposed by [7] and is based on singular value 
decomposition (SVD). A considerable step towards theoretical understanding of matrix completion from few entries 
was made in [3] who proved that with the use of trimming the performance of SVD-based matrix completion can be 
improved and a RMSE proportional to yjnr/\E\ can be achieved. The algorithm of [31 is referred to as OptSpace, 
and empirically it achieves state-of-the-art RMSE in the regime of very few revealed entries. 

OptSpace proceeds in three steps |3j. First, one trims the observed matrix Xi by setting to zero all rows (resp. 
columns) with more revealed entries than twice the average number of revealed entries per row (resp. per column). 
Second, a singular value decompositions is performed on the matrix and only the first r components are kept. When 
the rank r is unknown it is estimated as the index for which the ratio between two consecutive singular values has 
a minimum. Third, a local minimization of the discrepancy between the observed entries and a low-rank estimate is 
performed. The initial condition for this minimization is given by the first r left and right singular vectors from the 
second step. 

In this work we improve upon OptSpace by replacing the first two steps by a different spectral procedure that 
detects the rank and provides a better initial condition for the discrepancy minimization. Our method leverages on 
recent progress made in the task of detecting communities in the stochastic block model GDIS] with spectral methods. 
Both in community detection and matrix completion, traditional spectral methods fail in the very sparse regime due 
to the existence of spurious large eigenvalues (or singular values) corresponding to localized eigenvectors [3, [5]. The 
authors of Mi showed that using the non-backtracking matrix or the closely related Bethe Hessian as a basis for 
the spectral method in community detection provides reliable rank estimation and better inference performance. The 
present paper provides an analogous improvement for the matrix completion problem. In particular, we shall analyze 
the algorithm using tools from spin glass theory in statistical mechanics, and show that there exists a phase transition 
between a phase where it is able to detect the rank, and a phase where it is unable to do so. 


III. ALGORITHM AND MOTIVATION 


A. The MacBetH algorithm 


A standard approach to the completion problem (see e.g. 0) is to minimize the cost function 

mm £ [Mij - {XY%} 2 (2) 

(ij)eE 

over X £ M nxr and Y £ R mxr . This function is non-convex, and global optimization is hard. One therefore resorts to 
a local optimization technique with a careful choice of the initial conditions Xq , Yq. In our method, given the matrix 
Xi, we consider a weighted bipartite undirected graph with adjacency matrix A £ flj( n + m ) x O+m) 


A = 


0 M 
M t 0 


(3) 
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We will refer to the graph thus defined as Q. We now define the Bethe Hessian matrix H(/3) £ jjj(ra+m)x(n+m) 
the matrix with elements 


Hij(fi) = ( 1 + sinh2 P Aik ) ^3 ~ \ sinll ( 2 /3A?) . (4) 

V k£di / 

where f3 is a parameter that we will fix to a well-defined value /3sg depending on the data, and di stands for the 
neighbors of i in the graph Q. The MaCBetH algorithm that is the main subject of this paper is then, given the 
matrix A, which we assume to be centered: 


Algorithm (MaCBetH) 


1. Numerically solve for the value of (3sg such that 


F0sg) = 



tanh 2 (/3 S GA4.y) = 1. 

(hf)e-E 


(5) 


2. Build the Bethe Hessian H(/3sg) following eq. (jd]). 

3. Compute all its negative eigenvalues Ai, • • • , A r - and corresponding eigenvectors v\, ■ ■ ■ ,Vf- f is our estimate for 
the rank r. Set Xo (resp. Yo) to be the first n lines (resp. the last m lines) of the matrix [v\ i >2 • • • v r ~}. 

4. Perform local optimization of the cost function <§ with rank f and initial condition Xo,Yo- 

The function F , in the first step, being an increasing function of /3, /3gc can be found efficiently, e.g. by dichotomy. 
Alternatively, [3sg i n step 1 can be tuned in such a way that the number of negative eigenvalues of the Bethe Hessian 
is the largest possible. In step 2 we could also use the non-backtracking matrix weighted by tanh/3A4y', it was shown 
in [5] that the spectrum of the Bethe Hessian and the non-backtracking matrix are closely related. In the next section, 
we will motivate and analyze this algorithm (in the setting where A4 true was generated from elements-wise random X 
and Y) and show that in this case MaCBetH is able to infer the rank whenever e > e c . Fig. [l] illustrates the spectral 
properties of the Bethe Hessian that justify this algorithm: the spectrum is composed of a few informative negative 
eigenvalues, well separated from the bulk (which remains positive). This algorithm is computationally efficient as it 
is based on the eigenvalue decomposition of a sparse, symmetric matrix. 


B. Motivation from a Hopfield model 

We shall now motivate the construction of the MaCBetH algorithm from a graphical model perspective and a 
spectral relaxation. Given the observed matrix Ai from the previous section, we consider the following graphical 
model 




^ XP 


2^ 

(i,j)eE 


Sjtn 


( 6 ) 


where the {si}i<j< n and ar e binary variables, and f3 is a parameter controlling the strength of the inter¬ 

actions. This model is a (generalized) Hebbian Hopfield model |ljQ] on a bipartite sparse graph. To study it, we can 
use the standard Bethe approximation which is widely believed to be exact for such problems on large random graphs 
mm- In this approximation the means E(si),E(t :) ) and moments E(sifj) of each variable are approximated by the 
parameters bi,Cj and f.y that minimize the so-called Bethe free energy FBethe({fri}? {}, {£iy}) that reads 


-pBethe({fri}> { c j}) {£ij}) — 2 ^ ^ ^ 


1 —)— biSi —t - Cjtj —t - £ij Sitj 


B 1 - + E(‘ - ■ 


(i,j)eE Si ,tj 

1 “b Cjtj 


i= 1 


3 =1 


( 7 ) 


where rj(x) := x\nx, and di,dj are the degrees of nodes i and j in the graph Q. 
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FIG. 1: Spectral density of the Bethe Hessian for various values of the parameter /3. The red dots are the result of the direct 


diagonalisation of the Bethe Hessian for a rank r = 5 and n = m = 10 
average. 


matrix, with e = 15 revealed entries per row on 
The black curves are the solutions to the recursion (181 computed with belief propagation on a graph of size 10 5 . We 


isolated the 5 smallest eigenvalues, represented as small bars for convenience, and the inset is a zoom around these smallest 
eigenvalues. For /? small enough (top plots), the Bethe Hessian is positive definite, signaling that the paramagnetic state (|8| 
is a local minimum of the Bethe free energy. As (5 increases, the spectrum is shifted towards the negative region and has 5 
negative eigenvalues at the approximate value of /3sg = 0.12824 (to be compared to /3r = 0.0832 for this case) evaluated by our 
algorithm (lower left plot). These eigenvalues, corresponding to the retrieval states become positive and eventually merge 
in the bulk as ft is further increased (lower right plot), while the bulk of uninformative eigenvalues remains at all values of /3 
in the positive region. 


Neural networks models such as eq. © have been extensively studied over the last decades (see e.g. mm and 
references therein) and the phenomenology, that we shall review briefly here, is well known. In particular, for /3 small 
enough, the global minimum of the Bethe free energy corresponds to the so-called paramagnetic state 


Vi, j, bi = Cj = 0, £ij = tanh(/ 3M ZJ ). (8) 

As we increase /3, above a certain value /3 r, the model enters a retrieval phase, where the free energy has local minima 
correlated with the factors X and Y. There are r local minima, called retrieval states ({b\}, {cj}, {£? •}) indexed by 
Z = 1, - - - , r such that, in the large n, m limit, 


\/l = 1 • • • r, 


1 " 
m z —* 


>o, 


m 1 


XX >°- 


0 =1 


(9) 


These retrieval states are therefore well-suited as initial conditions for the local optimization of eq. ©> and we expect 
their number to tell us the correct rank. Increasing /3 above a critical value /?sg the system eventually enters a spin 
glass phase, marked by the appearance of many spurious minima. 

It would be tempting to continue the Bethe approach and to derive the belief propagation equations, but we shall 
here instead consider a simpler spectral relaxation of the problem, following the same strategy as used in [5j [B] for 
graph clustering. First, we use the fact that the paramagnetic state ^ is always a stationary point of the Bethe free 
energy, for any value of /3 na ng. In order to detect the retrieval states, we thus study its stability by looking for 
negative eigenvalues of the Hessian of the Bethe free energy evaluated at the paramagnetic state ([8]). At this point, 
the elements of the Hessian involving one derivative with respect to £,j vanish, while the block involving two such 
derivatives is a diagonal positive definite matrix E m- Tlie remaining part is the matrix called Bethe Hessian in E, 
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and eigenvectors corresponding to its negative eigenvalues are thus expected to give an approximation of the retrieval 
states Q . The picture exposed in this section is summarized in figure [lj This motivates the use of the MaCBetH 
algorithm. 

Note that a similar approach was used in m to detect the retrieval states of a Hopfield model using the weighted 
non-backtracking matrix fS], which linearizes the belief propagation equations rather than the Bethe free energy, 
resulting in a larger, non-symmetric matrix. The Bethe Hessian, while mathematically closely related, is also simpler 
to handle in practice. 


IV. ANALYSIS OF PERFORMANCE IN DETECTION 

We now show how the performance of MaCBetH can be analyzed, and the spectral properties of the matrix 
characterized using both tools from statistical mechanics and rigorous arguments. 


A. Analysis of the phase transition 


We start by investigating the phase transition above which our spectral method will detect the correct rank. Let 
x p = (Xp)i<i<r,y P = (y l p )i<i<r be random vectors with the same empirical distribution as the lines of X and Y 
respectively. Using the statistical mechanics correspondence between the negative eigenvalues of the Bethe Hessian 
and the appearance of phase transitions in model (JgJ) , we can compute the values /3r and 8sg where instabilities 
towards, respectively, the retrieval states and the spurious glassy states, arise. We have repeated the computations of 
D3HH! in the case of model using the cavity method m, We refer the reader interested in the technical details 
of the statistical mechanics approach to neural networks to nns]. 

Following a standard computation for locating phase transitions in the Bethe approximation (see e.g. |12l Sj). 
the stability of the paramagnetic state ^ towards these two phases can be monitored in terms of the two following 
parameters: 


A(/3) - hm E [n tanh 2 [f}2x l p y l ^j tanh 2 (/3^4 +1 y^) J 2s , 

P= 1 1 = 1 1 = 1 

s r r 

M(/3) = Hm e\JJ tanh + 8J2 x Wp) tanh {P\ x l+ iVp\ + ^Z^p+^p) 

p = 1 1=2 1=2 


( 10 ) 

(ii) 


where the expectation is over the distribution of the vectors x pi y p . The parameter A(/3) controls the sensitivity of 
the paramagnetic solution to random noise, while ft(/3) measures its sensitivity to a perturbation in the direction of 
a retrieval state. (3sg and 8r are defined implicitly as cA(/3sg) = 1 and e/r(/3ij) = 1, i.e. the value beyond which the 
perturbation diverges. The existence of a retrieval phase is equivalent to the condition 8sg > 8r, so that there exists 
a range of values of /? where the retrieval states exist, but not the spurious ones. If this condition is met, by setting 
8 = 8sg in our algorithm, we ensure the presence of meaningful negative eigenvalues of the Bethe Hessian. 

We define the critical value of e = e c such that ftsG > 8r if and only if e > e c . In general, there is no closed-form 
formula for this critical value, which is defined implicitly in terms of the functions A and /i. We thus computed e c 
numerically using a population dynamics algorithm HU and the results for C(r) = e c /r are presented on Figure [2] 
Quite remarkably, with the definition e = \E\/y/nm, the critical value e c does not depend on the ratio m/n , only on 
the rank r. 

In the limit of large e and r it is possible to obtain a simple closed-form formula. In this case the observed entries of 
the matrix become jointly Gaussian distributed, and uncorrelated, and therefore independent. Expression (101 then 
simplifies to 


A(/3) ~ E 

r—> oo 


tanh 2 y l ~) 

;=l 


( 12 ) 


We stress that the MaCBetH algorithm uses an empirical estimator © of this quantity to compute an approximation 
8sg of 8sg purely from the revealed entries. 

In the large r, e regime, both fisGi 8 r decay to 0, so that we can further approximate 


1 = eA(/3 S c) ~ er/^ G E[x 2 ]E[?/ 2 ], 

r,e—> oo 

(13) 

1 = eMAi) ~ e/3 RV /E[a; 2 ]E[?/ 2 ], 

r,e—>■ oo 

(14) 
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FIG. 2: Location of the critical value as a function of the rank r. MaCBetH is able to estimate the correct rank from 
\E\ > C(r)ry/nm known entries. We used a population dynamics algorithm with a population of size 1 x 10 6 to compute the 

. The dotted line is a fit suggesting that C(r) — 1 = 0(r -3 / 4 ). 


functions A and fi from ( 10| 11 


so that we reach the simple asymptotic expression, in the large e,r limit, that e c = r, or equivalently C(r ) = 1. It is 
interesting to note that this result was obtained as the detectability threshold in completion of rank r = 0(n ) matrices 
from 0(n 2 ) entries in the Bayes optimal setting in |20| . Notice, however, that exact completion in the setting of |2tT 
is only possible for e > r(m + n) /yjnm : clearly detection and exact completion are different phenomena. 


B. Computation of the spectral density 

In this section, we show how the spectral density of the Bethe Hessian can be computed analytically on tree-like 
graphs such as those generated by picking uniformly at random the observed entries of the matrix XY'. The spectral 
density is defined as 


v(\) 


1 n+m 

— — y 

n,m->oo n + m z ' 
i= 1 


lim 


S(X - A») 


(15) 


where the Aj’s are the eigenvalues of the Bethe Hessian. Using again the cavity method, It can be shown m that 
the spectral density (in which potential delta peaks have been removed) is given by 


n+m 


/(A) = lim 


n,m—too 7 r(n + m) 


ImAj(A), 


where the A,; are complex variables living on the vertices of the graph Q, which are given by: 

1 \-i 

' L 

k£di l£di 


A i= (-A + l+ ^ sinh 2 /3A ik - sinh 2 (2/3+)A;_,., 


(16) 


(17) 


where di is the set of neighbors of i. The are the (linearly stable) solution of the following belief propagation 

recursion: 

A = (- A + l+ ^ sinh 2 f3A lk - 'Y j sinh 2 (2 / 3H i/ )A i _ >i ^ . (18) 

k£di l£di\j 
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Rank 3 Rank 10 



FIG. 3: Mean inferred rank as a function of e, for different sizes. Each point is averaged over 100 samples of matrices XY* of 
size nx m, with the entries of A', Y drawn from a Gaussian distribution of mean 0 and variance 1. The theoretical transition is 
computed with a population dynamics algorithm (see section IV AI. The finite size effects are considerable but consistent with 
the asymptotic prediction. 


The ingredients to derive this formula are to turn the computation of the spectral density into a marginalization 
problem for a graphical model on the graph Q , and then write the belief propagation equations to solve it. Quite 
remarkably, it has been shown |22] that this approach leads to an asymptotically exact (and rigorous) description of 
the spectral density on Erdos-Renyi random graphs, which are locally tree-like in the limit where n, m —» oo. We can 
again solve equation (181 numerically using the belief propagation algorithm. The results are shown on Fig. [lj the 
bulk of the spectrum is always positive. 

We now demonstrate that for any value of j3 < /3sg> there exists an open set around A = 0 where the spectral 
density vanishes. This justifies independently or choice for the parameter 8. The proof follows [5; and begins 
by noticing that A= cosh~ 2 (/3A,y) is a fixed point of the recursion (181 for A = 0. Since this fixed point 

of this solution such that 
2 / 


is real, the corresponding spectral density is 0. Now consider a small perturbation <5*. 


= cosh 


- 2 / 


(/?Ay)(l + cosh (f3Aij)6ij). The linearized version of (18) writes S^j = X^ediV/tanh (PAu)5i^i . 
The linear operator thus defined is a weighted version of the non-backtracking matrix of [5] • Its spectral radius is 
given by p = eA(/3), where A is defined in 10 In particular, for /3 < /3 sg> P < 1, so that a straightforward application 
[5] of the implicit function theorem allows to show that there exists a neighborhood U of 0 such that for any A £ U, 
there exists a real, linearly stable fixed point of (18), yielding a spectral density equal to 0. 


V. NUMERICAL TESTS 


The algorithm was implemented in Julia [23], using the NLopt optimization package [24] for the minimization of 
the discrepancy ©• A matlab demo using the implementation of the limited-memory BFGS algorithm of |25j is also 
available. Both demos can be downloaded from [2Bj. 

Figure [3] illustrates the ability of the Bethe Hessian to infer the rank above the critical value e c in the limit of large 
size n, m (see section [IV A I. In Figure |4j we demonstrate the suitability of the eigenvectors of the Bethe Hessian as 
starting point for the minimization of the cost function ©• We compare the final RMSE achieved on the reconstructed 
matrix XY' with 4 other initializations of the optimization, including the largest singular vectors of the trimmed 
matrix M [3]. MaCBetH systematically outperforms all the other choices of initial conditions. Remarkably, the 
performance achieved by MaCBetH with the inferred rank is essentially the same as the one achieved with an oracle 
rank. By contrast, estimating the correct rank from the (trimmed) SVD is more challenging. We note that for the 
choice of parameters we consider, trimming had a negligible effect. Along the same lines, OptSpace [3] uses a different 






















Rank 3 


Rank 10 



FIG. 4: RMSE as a function of the number of revealed entries per row e: comparison between different initializations for the 
optimization of the cost function |2|. The top row shows the probability that the achieved RMSE is smaller than 10 _1 , while 
the bottom row shows the probability that the final RMSE is smaller than 10~ 8 . The probabilities were estimated as the 
frequency of success over 100 samples of matrices XY' of size 10000 x 10000, with the entries of X, Y drawn from a Gaussian 
distribution of mean 0 and variance 1. All methods optimize the cost function § using a limited-memory BFGS algorithm m 
part of NLopt [24j . starting from different initial conditions. The maximum number of iterations was set to 1000. The initial 
conditions compared are MaCBetH with oracle rank (MaCBetH OR) or inferred rank (MaCBetH IR), SVD of the observed 
matrix M after trimming, with oracle rank (Tr-SVD OR), or inferred rank (Tr-SVD IR, note that this is equivalent to OptSpace 
[3] in this regime), and random initial conditions with oracle rank (Random OR). For the Tr-SVD IR method, we inferred the 
rank from the SVD by looking for an index for which the ratio between two consecutive eigenvalues is minimized, as suggested 
in [28]. 


minimization procedure, but from our tests we could not see any difference in performance due to that. 


VI. CONCLUSION 

In this paper, we have presented MaCBetH, an algorithm for matrix completion that is efficient for two distinct, 
complementary, tasks: (i) it has the ability to estimate a finite rank r reliably from fewer random entries than 
other existing approaches, and (ii) it gives lower root-mean-square reconstruction errors than its competitors. The 
algorithm is built around the Bethe Hessian matrix and leverages both on recent progresses in the construction of 
efficient spectral methods for clustering of sparse networks ISM, and on the OptSpace approach [3] for matrix 
completion. Demos in Julia and matlab are available for download [2Bi. 

The method presented here offers a number of possible future directions, including replacing the minimization of the 
cost function by a message-passing type algorithm, the use of different neural network models, or a more theoretical 
direction involving the computation of information theoretically optimal transitions for detectability. 
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